Skip to content

perf(native): scope incremental rebuild to truly-changed files (#1012)#1027

Merged
carlos-alm merged 5 commits intomainfrom
perf/native-incremental-scope-1012
Apr 30, 2026
Merged

perf(native): scope incremental rebuild to truly-changed files (#1012)#1027
carlos-alm merged 5 commits intomainfrom
perf/native-incremental-scope-1012

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Native 1-file incremental was 4.3× slower than WASM (876ms vs 203ms) because run_pipeline re-parsed the full reverse-dep cone (47 files for a 1-file change), cascading into insert/structure/roles/analysis phases.
  • Adopt the WASM save+reconnect strategy: save reverse-dep → changed-file edges before purge, reconnect to new node IDs after Stage 5. Reverse-dep files are no longer re-parsed — only their affected edges are reconstructed.
  • Result: 876ms → 43ms (95% faster, 0.78× WASM).

Per-phase improvements

Phase Before After
insertMs 185ms 0.2ms
structureMs 73ms 3.1ms
rolesMs 311ms 18.3ms

Acceptance criteria (#1012)

  • Phases scale with truly-changed files (better than reverse-dep cone target)
  • Native within ≤1.5× WASM (now 0.78×)
  • Benchmark gate enforces ceiling — benchmark-parity-gate.mjs already checks 1-file incremental ratio at limit 1.5

Verification

  • Edge counts identical between full and 1-file rebuild (37134) — no edge loss/duplication
  • 181 Rust unit tests passed
  • 21 reverse-dep/incremental tests passed (incremental-edge-duplication, incremental-edge-parity, native-db-purge-reverse-deps, scoped-rebuild)
  • 18 broader integration tests passed (build-parity, incr-edge-gap, watcher-rebuild, cache, watcher-incremental)

Test plan

  • CI green (parity gate, benchmark)
  • Manual: run incremental-benchmark.ts to confirm timing

Native 1-file incremental was 4.3x slower than WASM (876ms vs 203ms) because
`run_pipeline` re-parsed the full reverse-dep cone (47 files for a 1-file
change), which cascaded into insert/structure/roles/analysis phases.

Adopt the WASM save+reconnect strategy: save reverse-dep → changed-file
edges before purge, reconnect to new node IDs after Stage 5. Reverse-dep
files are no longer re-parsed — only their affected edges are reconstructed.

Result: 876ms → 43ms (95% faster, 0.78x WASM).
- insertMs:    185ms → 0.2ms
- structureMs:  73ms → 3.1ms
- rolesMs:     311ms → 18.3ms

Edge counts identical between full and 1-file rebuild (37134), confirming
no edge loss or duplication. The existing `1-file incremental ratio` gate
in `benchmark-parity-gate.mjs` (limit 1.5) enforces the ceiling.
@claude
Copy link
Copy Markdown

claude Bot commented Apr 29, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Native Incremental Rebuild Performance Optimization

Reviewing PR #1027 for performance improvements to native incremental rebuilds.

Todo List

  • Read and analyze changed files
  • Root cause analysis - understand the problem being solved
  • Check for configuration/test integrity issues
  • Code quality and security review
  • Backlog compliance assessment
  • Final recommendation with rating

Root Cause Analysis

Problem Being Solved: Native 1-file incremental rebuilds were 4.3× slower than WASM (876ms vs 203ms) because run_pipeline was re-parsing the full reverse-dependency cone (47 files for a 1-file change) and cascading this through all pipeline phases (insert/structure/roles/analysis).

Why This Approach: The PR adopts WASM's proven save+reconnect strategy: save reverse-dep → changed-file edges before purge, reconnect to new node IDs after Stage 5. This eliminates the need to re-parse reverse-dep files — only their affected edges are reconstructed. The approach is well-tested in the WASM pipeline and provides identical edge counts.

Risk Assessment: LOW RISK - This is adopting a proven strategy from the WASM engine rather than inventing new logic. The PR claims identical edge counts (37134) between full and incremental rebuilds, indicating no data loss.

Configuration/Test Integrity Analysis

✅ PASSED - No configuration changes detected that lower standards:

  • No linting rules disabled or test timeouts increased
  • No test expectations weakened or coverage thresholds lowered
  • Only minor Cargo.lock version bump (routine dependency update)

Test Coverage: Strong existing regression protection for this exact scenario:

  • incremental-edge-duplication.test.ts - prevents edge leakage (bug: incremental rebuild leaks ~249 duplicate edges per run #979)
  • incremental-edge-parity.test.ts - ensures identical edge counts vs full builds
  • regression-guard.test.ts - performance regression gates with 25% threshold
  • PR claims all tests pass: 181 Rust unit tests, 21 incremental tests, 18 integration tests

Code Quality Assessment

Architecture: Clean implementation following established patterns

  • save_reverse_dep_edges() / reconnect_reverse_dep_edges() in change_detection.rs mirror existing WASM logic
  • Pipeline orchestrator in build_pipeline.rs lines 183-327 uses clear save-before-purge, reconnect-after-insert flow
  • Proper error handling and transaction management throughout

Security: No security concerns identified

  • No unsafe Rust code or external command execution
  • Proper SQL parameterization prevents injection
  • No credential handling or external API calls

Performance: Implementation optimized for the stated goal

  • Scoped parsing to truly-changed files only (build_pipeline.rs:221-225)
  • Efficient edge reconstruction without full re-parse
  • Batch operations for database writes

Backlog Compliance Assessment

  • Zero-dep: ✓ — No new runtime dependencies, builds on existing tree-sitter/SQLite/rusqlite
  • Foundation-aligned: ✓ — Directly addresses P1 ("always-current") by making incremental rebuilds faster
  • Problem-fit: 5/5 — Directly reduces rebuild latency for AI agents, preventing token waste from stale graphs
  • Breaking: No — Internal optimization with identical edge count output, no API changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Native incremental builds were 4.3× slower than WASM due to re-parsing full reverse-dep cone
**Why This Approach**: Adopts proven WASM save+reconnect strategy to avoid reverse-dep re-parsing
**Risk Assessment**: Low risk - mirrors existing tested approach, maintains edge count parity

## Backlog Compliance
- Zero-dep: ✓ — no new dependencies
- Foundation-aligned: ✓ — improves P1 (always-current) performance  
- Problem-fit: 5 — directly reduces AI agent rebuild latency
- Breaking: No — internal optimization, identical output
- Tier: 1

## Critical Concerns
None identified. Clean implementation with strong test coverage and performance validation.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: Excellent performance improvement (95% faster) using proven approach with comprehensive test coverage. Addresses core foundation principle of staying always-current. Minor deduction for lack of new test coverage specific to save+reconnect logic, but existing edge parity tests provide adequate protection.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR ports the WASM/JS incremental rebuild optimization to the native pipeline: instead of re-parsing every reverse-dep file in the changed-file cone, it saves the reverse-dep → changed-file edges before purge and reconnects them to new node IDs after Stage 5, cutting 1-file incremental time from 876 ms to 43 ms. It also fixes the role-classification gap for removal-only incremental builds (removal_reverse_deps seed) identified during the previous review round.

Confidence Score: 5/5

Safe to merge — no P0/P1 issues found; previous review threads are fully addressed.

The logic is well-reasoned and mirrors the proven WASM path. Edge save→purge→reconnect ordering is correct, deduplication via INSERT OR IGNORE is in place, the dropped-edge diagnostic is now logged, and the removal-only classification gap is patched. No new correctness or security issues identified.

No files require special attention.

Important Files Changed

Filename Overview
crates/codegraph-core/src/change_detection.rs Adds SavedReverseDepEdge, save_reverse_dep_edges, and reconnect_reverse_dep_edges; implementation correctly filters same-file self-edges, handles INSERT OR IGNORE no-ops in the reconnect counter, and wraps all reconnect inserts in a transaction.
crates/codegraph-core/src/build_pipeline.rs Replaces re-parse of reverse-dep cone with save+reconnect; adds removal_reverse_deps seed for role classification on file-deletion builds; correctly orders save before purge and reconnect after Stage 5/7.

Sequence Diagram

sequenceDiagram
    participant BP as build_pipeline
    participant CD as change_detection
    participant DB as SQLite

    Note over BP,DB: Incremental rebuild (new strategy)
    BP->>CD: save_reverse_dep_edges(changed_paths)
    CD->>DB: SELECT edges rdep→changed (pre-purge)
    DB-->>CD: SavedReverseDepEdge[]
    CD-->>BP: saved_edges

    BP->>CD: find_reverse_dependencies(removed_set) [if removals]
    CD->>DB: SELECT files that import removed files
    DB-->>CD: removal_reverse_deps

    BP->>CD: purge_changed_files(files_to_purge, &[])
    CD->>DB: DELETE nodes/edges for changed+removed files

    Note over BP,DB: Stage 4-5: parse & insert only changed files
    BP->>DB: parse_files_parallel(changed only)
    BP->>DB: do_insert_nodes (new node IDs)

    Note over BP,DB: Stage 7: rebuild edges for changed files
    BP->>DB: build_import_edges / build_call_edges

    BP->>CD: reconnect_reverse_dep_edges(saved_edges)
    CD->>DB: SELECT new node id by (name,kind,file,~line)
    CD->>DB: INSERT OR IGNORE edge (old_source_id, new_target_id)
    CD-->>BP: (reconnected, dropped)

    Note over BP,DB: Stage 8: classify roles
    BP->>DB: do_classify_incremental(changed_files + removal_reverse_deps)
Loading

Reviews (3): Last reviewed commit: "fix(native): reclassify reverse-deps of ..." | Re-trigger Greptile

Comment on lines +321 to +323
// Reconnect saved reverse-dep edges to new node IDs (#1012). Mirrors
// `reconnectReverseDepEdges` in build-edges.ts — for each saved edge,
// look up the new target node and recreate the edge with the original
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dropped edges silently discarded

reconnect_reverse_dep_edges returns (reconnected, dropped) but the return value is ignored here. Dropped edges occur when a target node can't be found after re-insert (e.g. a renamed or deleted symbol), and they're silently swallowed with no log output. In the normal "symbol still exists" path this is fine, but in practice this makes it very hard to diagnose partial edge loss during an incremental rebuild.

Suggested change
// Reconnect saved reverse-dep edges to new node IDs (#1012). Mirrors
// `reconnectReverseDepEdges` in build-edges.ts — for each saved edge,
// look up the new target node and recreate the edge with the original
if !saved_reverse_dep_edges.is_empty() {
let (reconnected, dropped) = change_detection::reconnect_reverse_dep_edges(conn, &saved_reverse_dep_edges);
if dropped > 0 {
eprintln!("[codegraph] reconnect_reverse_dep_edges: {reconnected} reconnected, {dropped} dropped (target nodes not found)");
}
}

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c268648reconnect_reverse_dep_edges return value is now captured and an eprintln! warning is emitted when dropped > 0 so partial edge loss is diagnosable. The normal happy path (no drops) stays quiet.

Comment on lines +473 to +485
Ok(new_id) => {
if insert_stmt
.execute(rusqlite::params![
s.source_id,
new_id,
&s.edge_kind,
s.confidence,
s.dynamic,
])
.is_ok()
{
reconnected += 1;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 reconnected counter inflated by INSERT OR IGNORE no-ops

insert_stmt.execute(…).is_ok() returns true for both a genuine insert (rows_affected() == 1) and a silently ignored duplicate (rows_affected() == 0, because INSERT OR IGNORE suppresses the constraint error). The counter will overcount when two saved entries resolve to the same (source_id, new_target_id, kind). Since the return value is only used for diagnostics this is low-impact, but it makes the (reconnected, dropped) log misleading.

Suggested change
Ok(new_id) => {
if insert_stmt
.execute(rusqlite::params![
s.source_id,
new_id,
&s.edge_kind,
s.confidence,
s.dynamic,
])
.is_ok()
{
reconnected += 1;
}
Ok(new_id) => {
match insert_stmt.execute(rusqlite::params![
s.source_id,
new_id,
&s.edge_kind,
s.confidence,
s.dynamic,
]) {
Ok(n) if n > 0 => reconnected += 1,
Ok(_) => {} // INSERT OR IGNORE: duplicate skipped
Err(_) => dropped += 1,
}
}

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2132d99 — replaced is_ok() with a match on rows_affected. Only Ok(n) if n > 0 increments reconnected; Ok(0) (the INSERT OR IGNORE no-op for duplicates) is now silently skipped instead of inflating the diagnostic counter. Genuine errors still increment dropped.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 29, 2026

Codegraph Impact Analysis

3 functions changed0 callers affected across 0 files

  • run_pipeline in crates/codegraph-core/src/build_pipeline.rs:102 (0 transitive callers)
  • save_reverse_dep_edges in crates/codegraph-core/src/change_detection.rs:371 (0 transitive callers)
  • reconnect_reverse_dep_edges in crates/codegraph-core/src/change_detection.rs:439 (0 transitive callers)

INSERT OR IGNORE returns Ok(0) for duplicate-row no-ops, so the previous
`is_ok()` check inflated `reconnected` whenever two saved entries
resolved to the same (source_id, new_target_id, kind). Match on the
rows-affected count and only increment when n > 0; leave the dropped
counter for genuine errors. Diagnostic-only — no functional change to
edge data.

Impact: 1 functions changed, 0 affected
`reconnect_reverse_dep_edges` returns `(reconnected, dropped)` but the
caller previously discarded the result. Dropped edges occur when a saved
target node can't be re-located after re-insert (e.g. the symbol was
renamed or deleted), and silently swallowing them makes partial edge
loss during incremental rebuild hard to diagnose. Surface the count via
eprintln when dropped > 0 — same observability pattern other native
diagnostics use.

Impact: 1 functions changed, 0 affected
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Comment on lines +196 to +202
let changed_paths: Vec<String> =
parse_changes.iter().map(|c| c.rel_path.clone()).collect();

if !opts.no_reverse_deps.unwrap_or(false) {
saved_reverse_dep_edges =
change_detection::save_reverse_dep_edges(conn, &changed_paths);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Removed files' reverse-dep roles not reclassified

changed_paths is built from parse_changes only — change_result.removed is excluded. save_reverse_dep_edges therefore never saves edges from files that import a removed file (call it B→A where A is deleted). Those edges are correctly purged when A's nodes are deleted. But because B never enters file_symbols, it also never enters changed_files, so do_classify_incremental never reclassifies it. B's fan-out silently decreases in the DB but its role record is stale.

In the old code changed_rel_paths included removed files, find_reverse_dependencies found B, B was re-parsed and landed in changed_files for reclassification. The new strategy intentionally skips re-parsing, but the classification gap for the removal case was not compensated. For a removal-only incremental build changed_files is empty, so neither do_classify_incremental nor do_classify_full is invoked, leaving every file that imported the deleted file with a permanently stale role until a future build touches them again.

A minimal fix: identify reverse-dep files of removed entries before purge (the existing find_reverse_dependencies with change_result.removed as seed would work), then union those file paths into the seed passed to do_classify_incremental. No re-parse is needed — reclassification reads fan-in/fan-out from the current (post-purge) edges table.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b6baec0 — applied your suggested approach: before purge, compute reverse-deps of change_result.removed via find_reverse_dependencies and union them into the seed passed to do_classify_incremental. No re-parse needed; the classifier reads fan-in/fan-out from the post-purge edges table.

This also covers the removal-only build case (where changed_files would otherwise be empty and classification skipped entirely). Verified the change doesn't regress: 181 Rust unit tests + 23 incremental integration tests + 15 roles tests all pass locally. The WASM side at detect-changes.ts:467 already includes ctx.removed in its findReverseDependencies seed, so this restores parity.

When a file is removed during incremental build, its nodes are purged
along with edges pointing at them. Files that imported the removed file
were never re-parsed (correct, by design) but their role records became
stale: fan-out dropped silently because do_classify_incremental was only
seeded with parsed files. For removal-only builds, classification was
skipped entirely.

Compute reverse-deps of removed entries before purge (find_reverse_deps
with change_result.removed as seed, mirroring WASM/JS) and union them
into the seed passed to do_classify_incremental. No re-parse needed —
the classifier reads fan-in/fan-out from the post-purge edges table.

Impact: 1 functions changed, 0 affected
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit a724a09 into main Apr 30, 2026
31 checks passed
@carlos-alm carlos-alm deleted the perf/native-incremental-scope-1012 branch April 30, 2026 03:50
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 30, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant